An analysis of the conceptual feasibility of using automatic speech recognition
and understanding technology in the design of an advanced training system was
conducted. The analysis specifically explored application to Ground Controlled
Approach (GCA) controller training. A systems engineering approach was followed
to determine the feasibility of such a system. Design features were developed,
including training requirements and constraints. An evaluation of the
state-of-the-art of speech understanding systems was conducted.

The results of the study indicate that the technology for automatic speech
recognition and understanding is adequate to warrant design and construction of
a feasibility demonstration model for the precision approach radar (PAR) phases
of GCA controller training. As conceived, the system would accept student
controller speech and convert it to functional 'commands' to drive the simulated
radar return of an aircraft on a display in a manner much as the controller
guides an actual aircraft on an approach to landing. Other design features would
include objective performance measurement and adaptive syllabus control for
increasing the difficulty of the training problem as a function of the student's
performance. It is recommended that a technical feasibility demonstration be
implemented.
FOREWORD


This  report  describes  the  first  steps  in  a  program  to  develop  techniques 
for  improving  quality  control  while  reducing  costs  in  the  training  of  Navy 
personnel in certain of the skills needed to perform control jobs such as
Ground Controlled Approach Controller, Ground Controlled Intercept Controller,
and Radar Intercept Officer. These jobs have in common the need to use
restricted, stylized speech to guide a recipient man-machine system to a
well-defined goal. Application of the advanced technology of machine
understanding of spoken commands in combination with the previously developed
technologies of automated adaptive training should make it possible to
realize important savings in manpower and other energy resources.

The outcome of the initial effort, as reported in this document, is a
functional design for an experimental laboratory version of an advanced
technology controller training system for GCA Controllers. This design is
being implemented for use in the Human Factors Laboratory of the Naval
Training Equipment Center. When installed, the experimental system will be
subjected to a period of test and evaluation to determine the technical
feasibility of the concept.
SECTION I


INTRODUCTION 


In 1969, the Naval Training Equipment Center (NAVTRAEQUIPCEN)
initiated a program to demonstrate that the effectiveness of training devices
can be increased by the application of advanced technology. Of specific
interest were the rapidly developing technologies in computer sciences
(especially programming) and in psychology (adaptive training and performance
measurement). This study is directed to advances in automatic speech
recognition (ASR) technology. ASR has developed from an interdisciplinary
approach which includes computer sciences and psychology as well as other
areas such as linguistics, phonetics, and artificial intelligence. Speech
understanding encompasses the identification and understanding of human
speech sounds by machine.

This is the final report of a project designed to establish the conceptual
feasibility of exploiting these technologies for training communications skills.
Adaptive training and objective performance measurement have been
demonstrated earlier.(1) The major impetus of this report was the study of ASR
technology and the analysis of the feasibility of a complete training system.
The training requirement selected for demonstration was that of developing
the skills and vocabulary needed by GCA controllers for final approach
control.

The report includes a functional design for a precision approach radar
(PAR) training system which uses ASR technology to recognize controller
commands. After recognition (identification), the commands are converted
to functional outputs reflecting the input of the student controller. The
functional system includes other advanced training techniques such as
automated-adaptive training.

Such a system should reduce the variability introduced by the many manual
functions inherent in existing training equipment. The system also provides
objective measures of performance and, by incorporating adaptive
techniques, allows each student to progress through the program at his most
effective rate.


(1) Charles, J. P., Johnson, R. M., and Swink, J. R. Automated Flight Training
(AFT) GCI/CIC Air Attack. Technical Report NAVTRAEQUIPCEN 72-C-0108-1,
July 1973. Naval Training Equipment Center, Orlando, Florida.
SECTION II

STATEMENT  OF  THE  PROBLEM 

GENERAL 

The successful combining of equipment and techniques into one functional
system cannot be based solely on the success of individual or separate
technical demonstrations. A period of conceptual analysis preceding technical
implementation is required to determine the feasibility of combinations.

A major problem is in the interfacing of system components. It is especially
critical when the technologies represented are from widely separated and
distinct disciplines. Therefore, the importance of isolating and analyzing
conceptual feasibility increases with system complexity and particularly with
rapidly advancing technology. Conceptual feasibility must be established,
either analytically or empirically, before the demonstration of technical
feasibility can be undertaken with reasonable confidence.

STATEMENT  OF  THE  PROBLEM 

The present study is primarily an analysis of conceptual feasibility of an
advanced system for GCA controller training. It required exploring the
possibility of combining automatic speech understanding with automated-adaptive
training to form an advanced computer-based training system. A comprehensive
review of speech understanding techniques consumed a major portion
of the early analyses, since the literature was scattered throughout several
academic and applied disciplines. Other portions of the study were dedicated
to the application of these techniques and to the development of a functional
design for the proposed system. Laboratory investigations of 1) speech timing
patterns and 2) feasible improvements to current speech recognition devices
were conducted.

Implementation of a demonstration model of the complete system and its
test and evaluation remain to be accomplished.
SECTION III
METHOD 

A systems engineering approach was taken to ensure an efficient and
thorough evaluation of conceptual feasibility. Of particular importance were
the analyses of PAR training requirements and of the state-of-the-art, which
were needed to establish whether advanced technology could be applied to the
training requirements. In-depth analysis of these factors was necessary to
give the study sufficient scope to be representative of a typical training
problem as well as predictive of the feasibility and applicability of the
approach.

The  project  began  with  a  constraint  and  training  requirement  analysis 
and  concluded  with  a  system  functional  design.  Fourteen  study  tasks  were 
identified as shown in Figure 1. The first five are oriented toward definition
of the problem, while Tasks 6 through 10 are oriented toward system
requirements. The last four (Tasks 11 through 14) are system functional design
tasks.


NAVTRAEQUIPCEN  73-C-0079-1 


SUS  -  SPEECH  UNDERSTANDING  SYSTEM 

PAR  -  PRECISION  APPROACH  RADAR  SYSTEM 


Figure  1.  Flowchart  of  Study  Tasks 
SECTION IV
RESULTS 


This section reviews the tasks outlined in figure 1. The results led to
the final design specification. In all cases, the results refer to a
'demonstration system' design in contrast to an operational training system
design. The requirements for the demonstration system are, however,
representative of fully implemented systems and stress the risk areas that
require actual demonstration of technical feasibility. The first tasks
completed in the study were the detailed definition of the training problem,
including analyses of constraints, training requirements, and available
technology.

TASK  1  -  DEFINE  CONSTRAINTS 

The first task included defining constraints for the study and for the
advanced training system. Constraints on the study were:

a.  Time  frame  -  The  study  was  constrained  to  1  year  in  which  both 
a  survey  of  existing  speech  recognition  technology  and  a  feasible 
system  functional  design  were  to  be  completed. 

b. Current speech technology - Since the goal was an applied training
system using advanced but available equipment, the study precluded
design or development of a new recognition device. However, it
did not rule out feasible improvement to existing devices.

Constraints  on  the  training  system  originated  from  three  sources: 

a.  Training  syllabus  characteristics. 

b. Equipment characteristics and limitations.

c.  Implementation  limits,  both  in  terms  of  supporting  equipment 
and  students. 

These  constraints  were  identified  as  follows: 

a. Vocabulary and phraseology should be limited to the precision
approach phase of GCA controller training, as defined by the
Syllabus for the Navy GCA Controller Training Course, FAA
Instruction 7110.8C, and standard Navy approach procedures.

b. Simulated approaches (student problems) should be standard PAR
approaches (surveillance and no-gyro approaches need not be
provided in the demonstration since they are, in effect, a simple
subset of the precision approach).

c. Existing speech understanding technology requires 'training' of
the system; i.e., each person using the system speaks each phrase
into the system 5 to 10 times. The system stores individual speaker
characteristics and adjusts the recognition threshold. The student
must try to use the same speech patterns (e.g., intonation,
inflection, stresses, and pauses) during training that he will use in
system operation or test.

d. To be implementable, the demonstration system should be compatible
with conventional computer systems such as the PDP-9 computer system
at NAVTRAEQUIPCEN.

TASK  2  -  ANALYZE  PAR  FUNCTIONS 

On  each  final  approach,  the  PAR  controller  must  perform  the  following 
functions: 


a.  Provide  published  decision  height,  if  requested. 

b.  Issue  glidepath  intercept  notification. 

c.  Issue  instruction  to  begin  descent. 

d. Issue glidepath advisories.

e.  Issue  course  advisories. 

f.  Issue  range  advisories. 

g.  Issue  decision  height  advisory. 

h.  Issue  clearance  information. 

i.  Issue  surface  wind  information. 

j.  Issue  position  advisories. 

k.  Issue  instructions  to  execute  missed  approach. 

l. Counter system malfunctions by corrective action.

m. Issue 'handoff' instructions.
To discharge these 13 functions properly, the PAR controller can use
approximately 80 basic phrases specified in FAA Instruction 7110.8C. These
are listed in appendix A along with the criteria for issuing each advisory. A
time-line analysis for a typical PAR approach was conducted from recorded
approaches made at Naval Air Station Miramar and is presented in appendix B.
It can be seen that transmissions on final approach occur with less than a
5-second pause and have a definite rhythm.

TASK  3  -  REVIEW  PAR  CONTROLLER  TRAINING  SYLLABUS 

The GCA controller training syllabus was reviewed to ensure that
development of the demonstration system would be compatible with actual
operational training requirements. As part of the review, several Navy
facilities were visited, including the GCA school at the Naval Air Technical
Training Center (NATTC), Glynco, Georgia, and the GCA unit at the Naval Air
Station Miramar, California. Navy personnel were interviewed regarding any
details of training not stressed in the syllabus but which were operationally
important. GCA controllers were also observed conducting approaches,
and audio tape recordings were made of final approaches. These observations
and recordings provided important additional information. The controllers
were particularly helpful in providing message timing and priority
information.

The data were used to identify training functions. The basic training
requirements identified included developing the following vocabulary and
procedural skills:

a.  Use  of  GCA  vocabulary. 

b.  Use  of  standard  and  approved  PAR  procedures. 

c.  Control  of  aircraft  under  various  weather  conditions. 

d.  Control  of  aircraft  with  different  flight  characteristics. 

e.  Control  of  aircraft  with  various  pilot  characteristics. 

f.  Control  of  aircraft  with  system  malfunctions  or  emergencies. 

g.  Control  of  aircraft  under  various  traffic  loads. 

TASK  4  -  REVIEW  SPEECH  UNDERSTANDING  TECHNOLOGY 

To determine the 'state-of-the-art' of automatic speech technology,
relevant literature in the fields of computer science, engineering, and
linguistics, from early 1950 to the present, was reviewed. Major emphasis was
given to the period from 1965 to the present, since most of the technical
advances have occurred in this time frame.

The library search uncovered over 150 references to automatic speech
recognition, analysis, synthesis, and understanding. A further search by
the National Technical Information Service (NTIS) yielded 100 references,
although some duplicated the 'academic' literature. The extensive
bibliography developed, which contains over 200 entries, is included in this
report.

A  concurrent  review  of  industry  capabilities  revealed  that  at  least  four 
companies  presently  build  or  are  actively  planning  to  build  speech  recognition 
devices  or  equipment.  A  review  of  the  devices  is  presented  in  appendix  C. 

TASK  5  -  DEFINE  PERFORMANCE  REQUIREMENTS 

GENERAL. Technical manuals point out that radar provides the most
precise means of obtaining information to guide an aircraft through an
instrument approach to landing. The radar system most commonly used is
called the Ground Controlled Approach (GCA) system. It is composed of two
subsystems: a surveillance radar and a precision approach radar (PAR). Both
subsystems provide azimuth and range information. The PAR system also
provides height information on the landing aircraft. The controller must
determine the relative position of the aircraft and advise the pilot through
the approach. The advisories are a set of short, well-defined phrases
designed to minimize ambiguity and misinterpretation.

Of the two subsystems, PAR presents the greater challenge to controller
training. This stems from the fact that the final approach controller must
determine and transmit glide slope, as well as heading and range information,
with no pause greater than 5 seconds. If the pause is longer, the pilot
assumes a loss of communications and executes a 'missed approach.' Landing
clearance, wind, and position advisories must also be given at specific points
in the approach. The major factors contributing to task difficulty include:

a.  Decreasing  safety  tolerances  with  range. 

b. Vocabulary constraints.

c.  Weather. 

d.  Aircraft  performance  and  type. 
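
The 5-second transmission rule described above lends itself to a simple automatic check. The following is a minimal sketch only, assuming that each transmission is logged as a (start, end) time in seconds; the function name and the sample time line are hypothetical and are not taken from the recorded approaches.

```python
from typing import List, Tuple

MAX_PAUSE_S = 5.0  # longest silence allowed between transmissions (from the text)


def find_long_pauses(
    transmissions: List[Tuple[float, float]],
    max_pause_s: float = MAX_PAUSE_S,
) -> List[Tuple[float, float]]:
    """Return (gap_start, gap_end) for every silence longer than max_pause_s.

    transmissions: (start, end) times in seconds, sorted by start time.
    """
    gaps = []
    # Compare the end of each transmission with the start of the next one.
    for (_, prev_end), (next_start, _) in zip(transmissions, transmissions[1:]):
        if next_start - prev_end > max_pause_s:
            gaps.append((prev_end, next_start))
    return gaps


# Hypothetical time line: the third transmission starts 6.5 seconds late.
timeline = [(0.0, 2.0), (4.5, 6.0), (12.5, 14.0)]
long_pauses = find_long_pauses(timeline)
```

A gap reported by such a check would correspond to a point at which the pilot could assume a loss of communications.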
PRESENT PAR TRAINING. GCA controller instruction is currently conducted
at the Naval Air Technical Training Center at Glynco, Georgia. The
GCA training program utilizes training devices such as that illustrated in
figure 2. The student is provided a simulated GCA control console which
includes communication equipment. For PAR controller instruction, the
display presents azimuth, elevation, and range. The student PAR controller
transmits advisories to an 'acting pilot' who 'flies' the simulated aircraft.
Aircraft position changes which occur as a result of the 'acting pilot's' flight
response are displayed on the student's radar display through a video
simulator. The instructor supervises training sessions, subjectively evaluates
student performance, and implements the overall training plan by altering
PAR conditions to present a variety of problems to the student. Two men
are normally required to teach one student controller.

The typical existing training system also lacks the following basic
training capabilities:

a.  Objective  performance  measurement. 

b.  Realistic  aircraft  performance. 

c.  Individualized  instruction. 

d.  Extrinsic  real-time  feedback  for  the  student. 

SPECIFIC PAR REQUIREMENTS. As already discussed, the PAR subsystem
includes radar displays of range-azimuth and range-elevation. These
displays are used by the controller to guide the aircraft on the landing approach
from approximately the final approach fix to the point where the aircraft is
over the landing threshold. The PAR controller uses specified phrases to
advise the acting pilot of his position with respect to the glidepath and the
extended runway centerline. Each phrase has a unique criterion for its use
(refer to appendix A).

PAR training requirements involve both content and timing of messages.
Content is a matter of using correct vocabulary and phraseology in message
transmission. Timing refers to when a message is to be sent and includes
both message spacing and message priority. The training problem is to
teach the student what he should say in specific cases, and when he should
say it. For example, when an aircraft radar return is about two-thirds above
the glidepath cursor, the correct advisory is:

'slightly above glidepath,' and is given as message priority permits.

If the aircraft return is completely above and separated from the glidepath
cursor, the correct advisory is:

'well above glidepath.'

Similar  criteria  are  used  for  heading  advisories. 

Under normal circumstances, the controller should achieve a 75 percent
glidepath and 25 percent course advisory ratio. Range information (in miles
to touchdown) should be given once per mile, and surface wind information
and clearance to land are given at about 3 miles.
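
The 75/25 advisory ratio above is one quantity that an objective performance measure could compute from a log of recognized advisories. The sketch below is illustrative only and assumes each logged advisory has already been labeled with a category string; the labels, the function name, and the sample log are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def advisory_shares(categories: List[str]) -> Dict[str, float]:
    """Share of glidepath vs. course advisories, ignoring all other categories."""
    counts = Counter(c for c in categories if c in ("glidepath", "course"))
    total = counts["glidepath"] + counts["course"]
    if total == 0:
        # No glidepath or course advisories were logged.
        return {"glidepath": 0.0, "course": 0.0}
    return {
        "glidepath": counts["glidepath"] / total,
        "course": counts["course"] / total,
    }


# Hypothetical log for one approach: 6 glidepath and 2 course advisories,
# plus range and wind advisories that do not enter the ratio.
log = ["glidepath", "glidepath", "course", "range", "glidepath",
       "glidepath", "course", "glidepath", "glidepath", "wind"]
shares = advisory_shares(log)
```

The resulting shares could then be compared with the nominal 75 percent and 25 percent targets when scoring a pass.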

Based on the Navy GCA Controller Course Syllabus (Lesson Guide 2.2.4.1),
the  controller  is  evaluated  in  terms  of: 

a.  Control  accuracy. 

b.  Clarity  of  instruction. 

c. Conformance to standard phraseology.

d.  Conformance  to  standard  procedure. 

According to the syllabus, "Any tendency of controllers to give erroneous
advisories or instructions or to become confused is considered a factor (in
the evaluation of competence) regardless how well the equipment works.
Simulated emergencies -- will be used to determine controller competence in
emergencies."

Thus, both the format and content of advisories as well as procedures
used by the students are important and must be considered.

TASK  6  -  ANALYZE  PAR  CONTROLLER  ADVISORIES 

The review of the existing GCA Syllabus at NATTC, Glynco, and relevant
Navy and FAA instructions revealed approximately 80 basic phrases for PAR
controlling (refer to appendix A). A detailed analysis of the advisories was
conducted. It revealed several regularities which are significant for
automatic recognition. They are:

a.  The  advisories  can  be  divided  into  three  categories:  glidepath, 
heading  (course),  and  ancillary  information. 
b. Advisories in each of the three categories begin with unique
sets of words which are nearly mutually exclusive (refer to
table 1).

c. The syntax of phrase composition is highly regular.

d. There is considerable redundancy of information within
phrases.

TABLE 1. CATEGORY DISTINCTIONS BY FIRST WORD

Glidepath      Heading      Information
-----------    ---------    -----------
Approaching    Turn         Cleared
Begin          Heading      Wind
On             On           At
Going          Going        No
Slightly       Slightly     (1 - 6)
Well           Well         Over
Coming         Right        Contact
Above          Left         Execute
Below

Note: Only 'on,' 'going,' 'well,' and 'slightly' are common to
different categories. By analyzing the second word (following
'well' and 'slightly'), the heading and glidepath phrases can be
distinguished. 'Going' requires at least two subsequent words
before the differentiation can be readily made.
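
The first-word regularity in table 1 can be turned into a small lookup. The sketch below is a hedged illustration, not the recognition logic of any actual device: the cue words used to resolve the four shared first words, the spelling of the digits 1 to 6 as words, and the sample phrases are all assumptions made for the example.

```python
# First words that belong to exactly one category (from table 1).
FIRST_WORDS = {
    "glidepath": {"approaching", "begin", "coming", "above", "below"},
    "heading": {"turn", "heading", "right", "left"},
    "information": {"cleared", "wind", "at", "no", "over", "contact", "execute",
                    # Assumption: the digits (1 - 6) arrive as spoken words.
                    "one", "two", "three", "four", "five", "six"},
}
# First words shared by the glidepath and heading categories.
AMBIGUOUS = {"on", "going", "well", "slightly"}
# Assumed later words that settle an ambiguous phrase.
GLIDEPATH_CUES = {"above", "below", "glidepath"}
HEADING_CUES = {"left", "right", "course"}


def classify(phrase: str) -> str:
    """Return 'glidepath', 'heading', 'information', or 'unknown'."""
    words = phrase.lower().split()
    if not words:
        return "unknown"
    first = words[0]
    if first in AMBIGUOUS:
        # Scan the following words until one of them identifies the category.
        for word in words[1:]:
            if word in GLIDEPATH_CUES:
                return "glidepath"
            if word in HEADING_CUES:
                return "heading"
        return "unknown"
    for category, first_words in FIRST_WORDS.items():
        if first in first_words:
            return category
    return "unknown"
```

For example, `classify("slightly above glidepath")` resolves on the second word, while a phrase beginning with 'going' may need a later word before a cue appears.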
PHONETIC ANALYSIS. Most approaches to automatic speech recognition
attempt to analyze speech into phonemic or subphonemic segments. Therefore,
a phonetic analysis was completed on basic or key PAR words that
distinguish phrases from one another. As shown in table 2, there is little
similarity between phonetic representations for key words. Of those words
having similarity, none were classified as minimal pairs. A minimal pair
refers to two words that differ by only one phoneme. Only the words
'heading' and 'holding' contain as many as four identical phonemes. The problem
was minimized when systems test indicated that heading and holding could
probably be readily differentiated on other criteria. (Most automatic
recognition techniques concentrate on vowels, which is precisely where the two
words heading and holding differ most; i.e., e vs. ow.)
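
The minimal-pair test and the count of identical phonemes described above can be expressed compactly. This is a sketch under stated assumptions: a minimal pair is taken to mean two equal-length phoneme sequences that differ in exactly one position, and the transcriptions of 'heading' and 'holding' (with 'ng' standing for the final nasal) are illustrative readings rather than authoritative ones.

```python
from collections import Counter
from typing import Sequence


def is_minimal_pair(a: Sequence[str], b: Sequence[str]) -> bool:
    """True if the sequences have equal length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def shared_phonemes(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of phonemes the two words have in common, ignoring position."""
    return sum((Counter(a) & Counter(b)).values())


# Assumed transcriptions for illustration only.
heading = ["h", "e", "d", "i", "ng"]
holding = ["h", "o", "w", "l", "d", "i", "ng"]

pair = is_minimal_pair(heading, holding)
common = shared_phonemes(heading, holding)
```

Under these assumptions the two words share four phonemes but are not a minimal pair, which matches the observation in the text.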


TABLE 2. PHONETIC COMPARISONS OF PAR WORDS

                         Phonemes by Position
Word           1    2    3    4    5    6    7    8    9    10
-----------    -    -    -    -    -    -    -    -    -    --
Above          a    b    ə    v
Approaching    a    p    r    o    w    č
Assigned       a    s    s    a    y    n    d
At             æ    t
Begin          b    i    y    g    i    n
Below          b    i    y    l    o    w
Cleared        k    l    i    y    r    d
Course         k    o    w    r    s
Decision       d    i    y    s    i    ž    ə    n
Eight          e    y    t
Four           f    o    w    r
Five           f    a    y    v
Glidepath      g    l    a    y    d    p    æ    θ
Going          g    o    w    i
Heading        h    e    d    i    ŋ
Holding        h    o    w    l    d    i    ŋ


PAUSE ANALYSIS. A major problem encountered in automatic recognition
of speech is the detection of the beginning and ending of an utterance. The
acoustic analog of language is a more or less continuous function. Yet, it
represents discrete units of meaning (words, morphemes). Thus an
automatic speech recognizer must be designed to break (segment) the acoustic
analog into these discrete units at the proper places.

The approach typically taken is to detect pauses of predetermined
lengths (e.g., 20 milliseconds). The pause is defined as a period of time
in which 'significant' acoustic energy is not present. The detection of pauses
is complicated by the fact that some intraword pauses have durations that
approximate interword or interphrase pause lengths. For example, 'stops'
such as b, p, t, and g all have pauses created by the constriction of the oral
cavity. One could mistakenly segment the word 'above' into two words, 'a'
and 'bove,' because of the bilabial stop associated with the 'b.' Should this
happen in PAR controlling, the phrase 'slightly above glidepath' would be
segmented into 'slightly a' and 'bove glidepath,' neither of which conveys
the full meaning.
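
The pause-detection approach described above (a period without 'significant' acoustic energy lasting at least a predetermined length) can be sketched as follows. The example assumes the signal has already been reduced to one energy value per fixed-length frame; the 10-millisecond frame size, the energy threshold, and the sample values are hypothetical.

```python
from typing import List, Tuple


def find_pauses(
    energies: List[float],
    frame_ms: int,
    threshold: float,
    min_pause_ms: int,
) -> List[Tuple[int, int]]:
    """Return (start_ms, duration_ms) for each run of low-energy frames
    lasting at least min_pause_ms."""
    pauses = []
    start = None  # index of the first frame of the current quiet run
    # A sentinel frame at the threshold closes a quiet run at the end of the input.
    for i, energy in enumerate(energies + [threshold]):
        if energy < threshold:
            if start is None:
                start = i
        elif start is not None:
            duration = (i - start) * frame_ms
            if duration >= min_pause_ms:
                pauses.append((start * frame_ms, duration))
            start = None
    return pauses


# Hypothetical frame energies at 10 ms per frame, with a 20 ms minimum pause.
frames = [5, 6, 0, 0, 0, 7, 0, 8, 0, 0]
pauses = find_pauses(frames, frame_ms=10, threshold=1, min_pause_ms=20)
```

In this sample the single quiet frame at 60 ms is rejected as too short, which is the same mechanism that would either accept or reject the brief closure of a stop consonant depending on the minimum length chosen.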

Since estimates of the durations of intraword, interword, and interphrase
pauses for GCA controllers were obviously required, audio tapes were
made of PAR controllers during actual approaches. Samples of these tapes
were then input into a Model 6061B Spectrum Analyzer and sonagrams made
of the different phrases. (A sonagram is a visual frequency-time-amplitude
display of the acoustical signal.) Figure 3 is a sample sonagram for the
advisory 'two miles from touchdown.' Since the sonagram is a precise
representation of sound over time, accurate calculation of pause durations was
possible. For the samples used, intraword pauses averaged 0.034 second
while interword pauses and interphrase pauses averaged 0.044 and 0.122
second, respectively. Distribution analysis showed some overlap.

These pause-length data were used to discuss the PAR phrase-understanding
problem with manufacturers of speech recognition devices. Most
existing devices require some sort of 'training'; i.e., analogs of the
utterance of each speaker are stored for later comparison in the recognition
process. The detection of pauses is not considered to be as difficult as it
would be if the devices were designed for continuous free speech.
Furthermore, the interphrase pause lengths are large (relative to intraphrase
pauses). Therefore, discrimination of pauses should probably present little or
no problem to the proposed system.
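
Because interphrase pauses are long relative to intraphrase pauses, a single duration threshold is one plausible way to mark phrase boundaries. The sketch below assumes a boundary value of 0.080 second, chosen only because it lies between the reported interword and interphrase averages; the function name and the sample pause values are hypothetical.

```python
from typing import List


def split_phrases(
    words: List[str],
    pauses_after_s: List[float],
    boundary_s: float = 0.080,  # assumed threshold, not a measured value
) -> List[str]:
    """Group words into phrases, ending a phrase when the pause that follows
    a word is at least boundary_s seconds long."""
    phrases = []
    current = []
    for word, pause in zip(words, pauses_after_s):
        current.append(word)
        if pause >= boundary_s:
            phrases.append(" ".join(current))
            current = []
    if current:
        # Keep any trailing words that were not closed by a long pause.
        phrases.append(" ".join(current))
    return phrases


# Hypothetical word sequence with the pause (in seconds) that follows each word.
words = ["slightly", "above", "glidepath", "on", "course"]
pauses = [0.040, 0.045, 0.120, 0.050, 0.130]
phrases = split_phrases(words, pauses)
```

Since the measured distributions showed some overlap, a fixed threshold of this kind would occasionally split or merge phrases and would need tuning against recorded data.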

SPEECH UNDERSTANDING ACCURACY. Speech understanding subsystem
(SUS) output will be utilized for three different functions. While 100 percent
correct understanding of any student's advisory output is desirable, lesser
accuracy can be tolerated, especially if a 'bootstrap' approach utilizing
redundant information in the system is mechanized to support the particular
requirement for each function. The three functions are: control of the pilot/
aircraft simulation, evaluation of student's vocabulary, and evaluation of
student's message. One of the most stringent requirements is imposed by
heading advisories which utilize numerals for heading control. However,
heading advisories should be confined to 3- to 5-degree heading changes. Thus
ambiguities or decision confusions can be resolved in favor of the most logical
choice. This same approach can be employed for other functions. The
preliminary analyses have indicated redundant information will be available to
aid in resolving recognition confusions or to resolve inconsistencies between
speech understanding output and other system data so as to be meaningful for
training purposes. Thus the approximately 90-percent accuracy achievable
with state-of-the-art speech recognition equipment should prove adequate for
demonstration purposes. Accuracy should approach 99 percent.
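
The idea of resolving a recognition confusion in favor of the most logical heading can be illustrated with a short sketch. It assumes the recognizer returns candidate headings in order of preference and that the current assigned heading is known from the simulation; the function names and the fallback to the first candidate are assumptions made for the example.

```python
from typing import List, Optional


def heading_change(current: int, candidate: int) -> int:
    """Smallest absolute difference between two headings, in degrees."""
    diff = abs(candidate - current) % 360
    return min(diff, 360 - diff)


def resolve_heading(
    current: int,
    candidates: List[int],
    min_change: int = 3,
    max_change: int = 5,
) -> Optional[int]:
    """Pick the first candidate implying a 3- to 5-degree change; otherwise
    fall back to the recognizer's first choice (or None if there is none)."""
    for candidate in candidates:
        if min_change <= heading_change(current, candidate) <= max_change:
            return candidate
    return candidates[0] if candidates else None


# Hypothetical confusion between "two one five" and "two seven five"
# while the aircraft is on heading 270.
chosen = resolve_heading(270, [215, 275])
```

Here the candidate 275 is selected because it implies a 5-degree change, while 215 would require an implausible 55-degree turn on final approach.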
Figure 3. Sonagram of the Advisory 'Two Miles from Touchdown'
TASK 7 - DEFINE SYSTEM FUNCTION REQUIREMENTS

A function analysis was conducted and reviewed against system
performance requirements. From this analysis, the following function
requirements were derived:

a.  Automatically  'understand'  approximately  80  PAR  phrases  (each 
phrase  composed  of  not  more  than  nine  words). 

b. Convert the 'understood' phrase to a form usable for aircraft
control.

c. Simulate 'pilot dynamics,' various aircraft types, and environment
factors to create realistic training problems.

d.  Simulate  the  PAR  radar  display. 

e.  Objectively  measure  and  evaluate  student  performance. 

f.  Provide  on-line  performance  feedback  to  the  student. 

g.  Provide  automatic-adaptive  on-line  syllabus  control. 

h.  Provide  a  hardcopy  printout  of  each  student's  performance. 

i. Summarize and store training data for each student.

Figure  4  shows  the  functional  flow  of  the  system  for  one  student. 
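
The automatic-adaptive syllabus control listed among the requirements above can be pictured as a rule that moves the problem difficulty up or down according to the student's measured performance. The sketch below is a hypothetical illustration only: the score scale of 0 to 1, the thresholds, and the ten difficulty levels are assumptions and are not part of the functional design.

```python
def next_difficulty(
    level: int,
    score: float,
    raise_at: float = 0.8,   # assumed score at or above which difficulty increases
    lower_at: float = 0.5,   # assumed score below which difficulty decreases
    max_level: int = 10,     # assumed number of difficulty levels
) -> int:
    """Return the difficulty level for the next problem."""
    if score >= raise_at:
        return min(level + 1, max_level)
    if score < lower_at:
        return max(level - 1, 1)
    return level  # performance in the middle band: repeat the same level


# Hypothetical sequence of scored passes starting at level 3.
level = 3
history = []
for score in [0.9, 0.85, 0.6, 0.4]:
    level = next_difficulty(level, score)
    history.append(level)
```

A rule of this form lets each student progress at a rate tied to measured performance rather than to a fixed schedule.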

The  PAR  functions  were  then  analyzed  in  depth  to  isolate  the  training 
task  and  support  functions  required.  As  mentioned  before,  the  Navy  GCA 
training  syllabus  was  reviewed  and  interviews  with  PAR  controllers  were 
conducted  to  establish  the  training  requirement. 

The  major  variables  affecting  PAR  controllers  were  identified  as: 

a.  Wind  factors. 

b.  Type  of  aircraft. 

c.  Pilot  variability. 

d.  Response  lag. 

e. Task loading.

f. Type of approach.

Figure 4. Training System Function Flow

The information requirements of the controller were identified as:

a. Glidepath cursor.

b.  Azimuth  cursor. 

c.  Aircraft  return  (targets  or  pips). 

d.  Mile  markers. 

e.  Safety  limits. 

f.  Tower  clearance. 

g.  Published  decision  height. 

h.  Runway  in  use. 


i.  Wind  conditions. 

Figures  5  and  6  depict  the  glide  slope  and  approach  course  geometry  for 
the  PAR  system. 

A detailed analysis of the training steps involved was conducted to ensure
exploration of all contingencies.

Figure  7  depicts  the  basic  training  session  in  a  first-level  flow  diagram, 
and  identifies  the  major  support  systems  functions  required. 

TASKS 8 AND 9 - DEFINE SPEECH UNDERSTANDING SYSTEM REQUIREMENTS
AND TRADE OFF SPEECH UNDERSTANDING SYSTEMS

A comprehensive review of the goals of machine speech understanding
development was sponsored by the Advanced Research Projects Agency in
the spring of 1970. The review was reported by Newell et al. (1/71) and
concluded that the following specification characteristics were a reasonable
goal.
Student Functions:

Student arrives for training.

Training System Functions:

Identify student name, training background, and training needs. Brief
student appropriately. Direct student to console.

Brief student for exercises. Train SUS. Check readiness.

Begin PAR program from teletypewriter.

Accept glidepath instructions. Accept approved course and range. Accept
runway, wind, minimums, missed approach, and emergency information. Score
pass performance. Structure course. Adapt difficulty of problems. Handle
contingencies.

Direct shutdown.

Direct student to debrief. Summarize and store output data. Terminate
training.

Major contingencies to consider:

a. Crash.

b. Student fails to take directed action.

Figure 7. PAR Training Functions

The system should:

(1)  Accept  continuous  speech 

(2)  from  many 

(3)  cooperative  speakers  of  the  general  American  dialect 

(4)  in  a  quiet  room 

(5)  over  a  good  quality  microphone 

(6)  allowing  slight  tuning  of  the  system  per  speaker 

(7)  but  requiring  only  natural  adaptation  by  the  user 

(8)  permitting  a  slightly  selected  vocabulary  of  1000  words 

(9)  with  a  highly  artificial  syntax 

(10) and a task like the data management or computer status tasks
(but not the computer consultant task)

(11)  with  a  simple  psychological  model  of  the  user 

(12)  providing  graceful  interaction 

(13)  tolerating  less  than  10  percent  semantic  error 

(14)  in  a  few  times  real  time 

(15) and be demonstrable in 1976 with a moderate chance of
success.

These  characteristics  are  accepted  as  a  realistic  goal  and  provide  a 
standard  against  which  the  training  requirements  can  be  evaluated. 

After reviewing the Newell characteristics, LOGICON conducted an
analysis of PAR advisory phrases and compiled a similar list of
characteristics for the speech understanding component of the GCA controller
training system. The speech understanding component of the GCA controller
training system should:

(1) Accept short phrases (noncontinuous speech)

(2)  from  many 

(3)  selected  speakers  of  the  general  American  dialect 

(4)  in  a  relatively  noisy  room 

(5)  over  a  good  quality  microphone 

(6)  allowing  training  of  the  system  per  speaker 

(7) but requiring only natural adaptation by the user

(8) permitting a highly selected vocabulary of approximately 70 words

(9) with an extremely invariant and orderly syntax

(10) for a GCA task

(11)  with  a  simple  functional  model  of  the  pilot  or  aircraft 

(12)  providing  graceful  interaction 

(13) tolerating less than 5 percent system response error

(14)  in  near  real  time 

(15)  and  be  ready  for  field  testing  in  1975. 
Three of these specifications appear more stringent than their
counterparts in the Newell et al. listing. The first of these, (3) above, refers
to basic speech training and would involve control of individual voice
characteristics such as dialect, timing, and patterns of inflection. Although
limited, such control occurs initially by student selection. Controller
training itself develops a certain discipline in speech habits. GCA phraseology
is, of necessity, highly restricted and invariant when compared to free,
connected speech. For example, vocabulary requirements for the GCA
system are very limited (66 words).

The second, (4) above, refers to the ambient level of noise. The present
system must deal with a certain amount of interference from a variety of
related equipment including radios and other controllers. While noise in the
operational controller environment itself cannot be appreciably reduced, it
is subject to some control in the training environment. Fortunately, the
speech analog itself is robust and provides a certain resistance to distortion.
Major problems would be expected only if two or more voices were
simultaneously picked up by the microphone, or if ambient noise was of sufficient
volume to drown out the student's voice. These contingencies can be
controlled with little effort by use of sensitive and directional microphones.

The third, (13) above, refers to tolerances for semantic error; that is,
errors at the level of meaning. An allowable error rate of 10 percent is
too high for the voice system, since the overall system response error rate
should be 5 percent or less. Error tolerance was discussed briefly under
Task 6. The Newell et al. system is based on continuous speech and a
1000-word vocabulary. Continuous speech in particular, and to a lesser
degree the large vocabulary, precludes the possibility of a secondary
scanning of the material within the specified real-time frame. The GCA system,
concerned with short phrases and a highly selected, small vocabulary within a
less stringent time frame, will allow for a secondary scan of the speech signal
to be recognized. Newell was also concerned specifically with errors of
meaning, while the training system is concerned only with errors of response
(understanding) to a specified input. Therefore, system accuracy greater than
95 percent is considered a reasonable and achievable goal.

In summary, none of the three major specification differences appears
to be a limiting factor in training system design, nor do any of the other
minor changes. Requirements for the speech understanding component of
the training system are much less ambitious than the Newell requirements.
Therefore, based on vocabulary and phrase requirements, the speech
understanding component of the GCA controller training system appeared to be
within the current state of the art. To verify this, LOGICON made a
preliminary survey of companies building speech recognition devices. The
survey particularly emphasized the following objectives:

a. Recognition of phrases up to 4 seconds in length.

b. Ninety-five percent phrase accuracy.

c.  Ability  to  handle  multiple  speakers  and  dialects. 

d.  Minimal  system  training  or  tuning  for  each  different  speaker. 

The  survey  revealed  four  devices  under  development,  three  of  which 
were  operational  at  the  time  of  the  survey.  One  of  the  three  was  selected 
as  the  candidate  for  the  GCA  controller  training  system.  Details  of  the 
survey  and  consequent  trade-off  are  found  in  appendix  C. 

TASK  10  -  DEFINE  GCA  CONTROLLER  TRAINING  SYSTEM 

GENERAL.  The  initial  system  definition  requires  identification  of  high  risk 
or  critical  design  features.  For  the  GCA  controller  training  system,  the 
critical  features  are  those  requiring  a  speech  understanding  or  recognition 
capability.  The  design  requirement  can  be  limited  to  the  PAR  phase  of 
operations  since  it  is  the  limiting  condition.  Simulation  of  displays  and 
training  exercises  need  only  be  functionally  realistic  to  establish  technical 
feasibility. 

DEFINITION.  As  presently  conceived,  the  GCA  controller  training  system 
design  features  four  major  subsystems.  They  are: 

a.  Adaptive  syllabus  control  subsystem. 

b. Student speech evaluation subsystem.

c. Training control subsystem.

d.  Speech  understanding  subsystem. 

The following four task descriptions (Tasks 11 through 14) include a
discussion of each of these subsystems.


TASK  11  -  DESIGN  TRAINING  SYLLABUS 


A preliminary syllabus is based on the variables identified in Task 7.
Of the six variables listed, four have been tentatively selected for use in the
demonstration system. These four represent factors significantly affecting
task difficulty (for the controller) during the PAR approach. The other two
variables, task loading and type of approach, concern subject variables and
procedures not within the scope of the demonstration model (they primarily
increase difficulty for the pilot and actually simplify the problem for the
controller). These variables need not be included in the feasibility
demonstration.

ADAPTIVE VARIABLES. For the demonstration, three values representing
points along a continuum of difficulty were identified for each of the four
variables previously described. Each point represents a level of difficulty.
Further refinement of the values will occur during the early part of the
demonstration as a function of the flight equations mechanized.
Representative levels of difficulty for the demonstration are listed in the following:

a.  Wind  factors: 

1.  Fifteen-knot  head  wind. 

2.  Fifteen-knot  cross  wind. 

3.  Thirty-knot  (variable)  cross  wind,  terrain  turbulence. 

b.  Aircraft  type.  Three  typical  aircraft  characteristics  reflecting 
FAA  categories  were  identified.  These  are,  based  on  weight  and 
speed: 

1. Less than 91 knots and 30,000 pounds (category A).

2. One-hundred twenty-one to 141 knots, 60,000 to 150,000 pounds
(category B).

3. One-hundred sixty-six knots, any weight (category E).

c.  Pilot  response  lag  (must  be  established  after  flight  equations  are 
identified): 

1.  Average. 

2.  Long. 

3.  Short. 

d.  Pilot  variability  (must  be  established  after  flight  equations  are 
identified): 

1. 0.5σ (standard deviation).

2. 1.0σ.

3. 2.0σ.
These 'pilot variables' reflect aircraft control characteristics as well
as pilot control input. They must be quantified and checked during the early
part of the demonstration phase.


SYLLABUS. Table 3 presents a feasible training syllabus containing 20
exercises. Each exercise was constructed by including one level of each of
the four adaptive variables in an order which increased level of difficulty.
The syllabus also reflects information obtained from GCA controllers at
Miramar NAS. These controllers indicated that wind and aircraft factors
are more salient than pilot factors, although all three contribute to difficulty
of the approach from the controller's standpoint.


TABLE 3. PRELIMINARY TRAINING SYLLABUS

Sequence Number: 1 through 20.

Column headings: Wind Factors, Pilot Response Lag, Pilot Variability,
Aircraft Type.

Level entries:

1 1 1 1 1 2 1 1 3 1 2 2
1 2 3 2 1 2 2 1 3 2 2 3
2 2 2 2 3 3 2 3 2 3 1 3
3 2 1 3 2 1 3 3 2 3 3 2
3 3 3 3 3 3 3 3 3 3 3 3

ADAPTIVE SYLLABUS CONTROL SUBSYSTEM (ASCS). The ASCS is
presently conceptualized as consisting of the syllabus and the logic used
to select the next training problem. The syllabus (table 3) is arranged in
order of increasing difficulty. The selection of each succeeding problem
will be based on the score attained on the previous problem. The student
will be advanced more rapidly for good performance than for average
performance. Poor performance will result in decrementing the level by
one or more steps. The student moves through the program in accordance
with the logic shown in figure 8 and graduates from it upon satisfactory
completion of the most difficult problem. The logic is similar to that used
in a previous program (ATE). It permits the student to complete the
course in accordance with his ability.
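
The selection rule described above can be sketched in a few lines. This is a minimal illustration only: the score cutoffs (`GOOD`, `AVERAGE`), the step sizes, and the name `next_exercise` are assumptions made for the example and are not taken from figure 8. The report allows a drop of one or more steps for poor performance; the sketch uses one.

```python
from typing import Optional

# Hypothetical sketch of adaptive syllabus selection. Cutoffs and step
# sizes are illustrative assumptions, not values from the report.
FIRST, LAST = 1, 20          # Table 3 lists 20 exercises
GOOD, AVERAGE = 0.85, 0.65   # assumed composite-score cutoffs (0.0 to 1.0)


def next_exercise(current: int, score: float) -> Optional[int]:
    """Return the next exercise number, or None when the student graduates."""
    if current == LAST and score >= AVERAGE:
        return None          # satisfactory completion of the hardest problem
    if score >= GOOD:
        step = 2             # good performance: advance more rapidly
    elif score >= AVERAGE:
        step = 1             # average performance: advance one step
    else:
        step = -1            # poor performance: drop back one step
    # Keep the result inside the syllabus.
    return max(FIRST, min(LAST, current + step))
```

Under these assumed cutoffs, a student on exercise 5 who scores 0.9 moves to exercise 7, while a score of 0.3 returns the student to exercise 4.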

TASK  12  -  DESIGN  STUDENT  SPEECH  EVALUATION  SUBSYSTEM 

The  problem  definition  phase  devoted  considerable  effort  to  identifying 
objective  measures  of  student  performance  that  might  be  used  to  reflect 
learning.  Although  present  PAR  controller  performance  measures  are 
largely  subjective,  the  syllabus  used  by  the  Navy  provided  guidelines  for 
developing  objective  measures.  Additional  background  data  for  performance 
measurement  was  collected  at  NAS  Miramar.  The  end  result  was  a  list  of 
nine  measures  which  reflect  the  performance  requirements  discussed  earlier. 
The  measures  include: 

a.  Percent  correct  advisories. 

b.  Ratio  of  glidepath  to  heading  advisories. 

c.  Number  of  errors  in  phraseology. 

d. Error of aircraft about glidepath.

e.  Error  of  aircraft  about  centerline. 

f. Number of advisories in each category (well above, below, right,
left, etc.).

g.  Number  of  1-degree  heading  changes. 

h.  Time  delay  between  advisories. 


i. Procedural errors or omissions; e.g., failure to issue:

1. "Begin descent."

2. Reference to touchdown.

3. Heading changes in 5-degree increments beyond 3 miles.

4. Two- to 5-degree heading changes within 3 miles of touchdown.

5. Landing clearance.

Charles, J. P. and Johnson, R. M. Automated Training Evaluation (ATE).
Technical Report: NAVTRADEVCEN 70-C-0132-1. January 1972. Naval
Training Equipment Center, Orlando, Florida.

Figure 8. Adaptive Logic Flowchart

Some or all of the nine measures will be combined to produce a
composite score of PAR controller performance for use in the adaptive logic to
adjust problem difficulty on successive runs. This score is referred to as
'C' in figure 8.
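
One simple way to combine measures into a single score is a weighted mean. The sketch below is an assumption for illustration: the report does not state how 'C' is computed, and the measure names, the weights, and the scaling of each measure to the range 0 to 1 are all hypothetical.

```python
# Hypothetical composite score "C". Measure names, weights, and the
# 0-to-1 scaling of each measure are assumptions for illustration only.
WEIGHTS = {
    "percent_correct_advisories": 0.4,
    "glidepath_tracking": 0.3,    # 1.0 means no error about glidepath
    "centerline_tracking": 0.2,   # 1.0 means no error about centerline
    "phraseology": 0.1,           # 1.0 means no phraseology errors
}


def composite_score(measures):
    """Weighted mean of the supplied measures, each scaled to [0, 1]."""
    used = {name: w for name, w in WEIGHTS.items() if name in measures}
    if not used:
        raise ValueError("no recognized measures supplied")
    total_weight = sum(used.values())
    # Dividing by the weight actually used lets "some or all" measures apply.
    return sum(measures[name] * w for name, w in used.items()) / total_weight
```

Normalizing by the weights actually used means the score stays in the same range whether some or all of the measures are supplied.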

TASK  13  -  DESIGN  TRAINING  CONTROL  SUBSYSTEM 

The training control subsystem is the vehicle for automated instruction. As
such, it takes the selected problem from the adaptive syllabus control
subsystem, initializes it, generates and controls the PAR display, converts the
SUS output to aircraft parameters, implements adaptive variables, and, in
general, controls the progress of the task. Figure 9 portrays the block
diagram of these functions. Of these functions, two require further
elaboration. These are aircraft simulation and PAR display simulation.

AIRCRAFT SIMULATION. Analyses conducted revealed that only two basic
parameters of flight need be controlled during PAR: heading and vertical
speed. Thus, sophisticated flight equations need not be developed to
implement heading and power controls for the demonstration. Simple transfer
functions can be used. However, these transfer functions must be variable
to reflect each of the three categories of aircraft identified earlier. Similar
transfer functions can be used for wind and pilot factors.
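
As an illustration of what a simple transfer function might look like, the sketch below models heading response as a first-order lag toward the commanded heading. The choice of a first-order lag, the time constants per aircraft category, and the name `step_heading` are assumptions for this example; the report leaves the actual functions to be established during the demonstration.

```python
# Hypothetical first-order lag model of heading response. The time
# constants per aircraft category are placeholders, not report values.
TIME_CONSTANT_S = {"A": 2.0, "B": 4.0, "E": 6.0}   # assumed, in seconds


def step_heading(heading, commanded, category, dt=0.5):
    """Advance heading (degrees) one time step toward the commanded value."""
    tau = TIME_CONSTANT_S[category]
    # Shortest signed angular difference, in the range [-180, 180).
    error = (commanded - heading + 180.0) % 360.0 - 180.0
    # Discrete first-order lag: close a fraction dt/tau of the error.
    return (heading + error * dt / tau) % 360.0
```

Calling `step_heading` repeatedly after an advisory such as "turn right, heading 235" moves the simulated heading toward 235 at a rate set by the assumed time constant; a larger constant gives a more sluggish aircraft.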

PAR DISPLAY. For demonstration, a general-purpose CRT system can be
used to simulate the display of the GCA CPN-4 system. The PAR display is
actually two separate radar presentations displayed on one scope. The upper
portion is the elevation display (EL) while the lower is the azimuth display
(AZ). The scans are 7 degrees in elevation and 20 degrees in azimuth. The
'target' or 'pip' is about 0.50 inch high in the elevation display and 0.25 inch
wide in the azimuth display. Based on information from the Navy GCA
Syllabus, the simulated final approach leg should be displayed with a range of
50,000 feet or approximately 9.4 miles in length. The glide slope can be
set at a 2.5-degree angle producing a glidepath intercept altitude of about
2180 feet. These values are considered optimal. The azimuth display has
the same range scale as the elevation display. Both displays should be
logarithmic in range. Decision height will be 100 feet and the altitude over
landing threshold 50 feet. These points occur at 0.66 and 0.33 mile,
respectively, on the azimuth display. Safety limits for the elevation display
should be included and can be based on flat terrain. An example of such a
display is shown in figure 10.
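
The intercept altitude quoted above follows from simple right-triangle geometry. The short sketch below checks the arithmetic under the assumptions of flat terrain and a glidepath that reaches zero height at zero range; the function name and the use of statute miles for the conversion are assumptions for this example.

```python
import math

FEET_PER_STATUTE_MILE = 5280.0   # assumed unit for the "miles" conversion


def glidepath_altitude_ft(range_ft, glide_slope_deg=2.5):
    """Height of the glidepath at a given range, assuming flat terrain."""
    return range_ft * math.tan(math.radians(glide_slope_deg))


# A 50,000-foot approach leg with a 2.5-degree glide slope.
intercept_ft = glidepath_altitude_ft(50_000.0)
leg_miles = 50_000.0 / FEET_PER_STATUTE_MILE
print(round(intercept_ft), round(leg_miles, 2))
```

The result is roughly 2180 feet at a range a little under 9.5 statute miles, which agrees with the rounded figures in the text.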

For the demonstration, the 'target' should begin at a point 10 miles from
touchdown and on runway heading. An altitude of 2180 feet can be maintained
until intercept of the glidepath has occurred. Once intercept has occurred,
'target' movement will be a function of glidepath and course advisories.

TASK  14  -  DESIGN  SPEECH  UNDERSTANDING  SUBSYSTEM 

The speech understanding subsystem (SUS) is the crux of the GCA controller
training system. When operating properly, the SUS will accept one of
approximately 80 advisories every 5 seconds or less, recognize it, and
convert it to a functionally acceptable output (understanding). The SUS may be
divided into sections which perform specific functions, as shown in figure 11.
The system accepts advisories from the student, converts them to digital
form, and extracts features which are used in recognition. These feature
strings are sent to the central processor where they are compared to stored
'training' phrases. The resulting correlation of the spoken phrase to all
stored phrases is formed into an array, ordered, and compared to a
threshold. The value(s) above threshold is then sent through the recognition
assistance routine where it is compared to the expected value(s) that is computed
based on actual aircraft position and the advisory which should be issued.

If there is a match, a final decision is made as to the phrase content, and
the result is sent to the training control system and to the student speech
evaluation subsystem. If the expected value(s) does not match the recognized
phrase, one of three events may have occurred:

a.  The  phrase  was  spoken  incorrectly  by  the  student. 

b.  The  phrase  was  spoken  correctly,  but  the  SUS  failed  to  recognize 
it  (rejection). 

c.  The  SUS  recognized  the  phrase  incorrectly  (false  recognition). 

The seriousness of these occurrences and their resolution must await
detail design and development effort. For example, the three events will
largely be resolved if high system accuracy is achieved.
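
The recognition flow described in this task (correlate against stored phrases, order, threshold, then compare with the expected advisory) can be sketched as follows. This is an illustrative sketch only: the use of cosine similarity as the correlation, the threshold value, the toy feature vectors, and the names `correlate` and `recognize` are assumptions, not details of the candidate device.

```python
import math


def correlate(a, b):
    """Normalized correlation (cosine similarity) of two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recognize(features, templates, expected, threshold=0.8):
    """Return (phrase, status) for one spoken advisory.

    templates maps each stored 'training' phrase to its feature vector;
    expected is the advisory computed from actual aircraft position.
    """
    # Correlate with every stored phrase and order the results.
    scores = sorted(
        ((correlate(features, vec), phrase) for phrase, vec in templates.items()),
        reverse=True,
    )
    candidates = [phrase for score, phrase in scores if score >= threshold]
    if not candidates:
        return None, "rejected"          # nothing exceeded threshold
    if expected in candidates:
        return expected, "match"         # recognition assistance confirms
    return candidates[0], "mismatch"     # student error or false recognition
```

A "mismatch" result does not by itself say which of the three events occurred; separating student errors from machine errors would need the additional information discussed in the report.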

Figure 11 illustrates the organization of the SUS. An overall block
diagram of the GCA controller training system is shown in figure 12.

Figure 10. Typical PAR Display

Figure 11. Speech Understanding Subsystem (SUS)

SECTION V
DISCUSSION


The review of the literature and the existing developments has revealed
a rapidly expanding speech recognition and understanding technology. The
effectiveness of this technology is magnified when the problem can be
constrained by a small vocabulary, a rigid syntax, and a well-defined stimulus
situation. The PAR phase of GCA operation is particularly amenable to
automatic speech recognition in that controller phrases are short in length, finite
in number, and regular in syntax. No minimal pairs were discovered, which
significantly reduces the possibility of interword false recognition. Further
developments should increase the observed recognition accuracy of 90
percent to 95 percent or greater.

There are still, however, several limitations on applications to training
systems. For example, state-of-the-art speech understanding systems
require the following:

a. Speech relatively free of 'uh's,' 'er's,' and 'ah's'; elongated syllables
such as 'well-l-l-l-l'; and unplanned pauses.

b. High similarity of speech between system training and system
operation, particularly with respect to stress and intonation of the
speaker.

c. Discrete pauses of at least 0.25 second between phrases.

d. System 'pretraining' for each new speaker.

e. Supplementary software routines to distinguish machine errors from
student errors.

The first four limitations present little problem for experienced
controllers since they have been trained to produce a consistent speech pattern with
a definite rhythm. For a feasibility demonstration, however, untrained
subjects may be used. This would require PAR vocabulary training to provide
adequate knowledge of standard phrases. A prompting routine to facilitate
the 'student task' during the demonstration might be considered.

The problem of detecting the type of error and its resolution can
probably be solved during detailed design. The errors of consequence are 1)
improper phraseology by the student, 2) failure to recognize a correct phrase,
and 3) error in phrase recognition. The first error can result in two
outcomes. Ideally, the system will reject the phrase as incorrect. However,
if the correlation with a stored phrase exceeds threshold, the system will
falsely identify the incorrect phrase and output data. The probability of
this occurrence must be minimized for effective training.

The second error (i.e., failure to recognize a correct phrase) is less
serious, although scoring will be affected. The outcome in terms of display
change will be inconsequential and may well add realism to the simulation.

The third error, incorrectly identifying a correct phrase, can be serious.
Fortunately, the GCA phraseology is highly redundant, and a great deal of
information exists within the system to validate or verify the speech
recognition output.

The redundancy within the phrases can reduce the speech to be identified
within the advisory to one or two words. For example, the distinction between
'well above glidepath' and 'well below glidepath' reduces to a distinction
between above and below. Moreover, machine errors, which stem from faulty
threshold settings or matching errors, are expected to be qualitatively
different from student errors resulting from confusion and faulty judgment. The
true nature of these differences should be established via analyses conducted
during the early phases of demonstration. Software routines can then be
developed to resolve the problem based on other information available within
the system.

SECTION VI

CONCLUSIONS AND RECOMMENDATIONS


The key issue in establishing the conceptual feasibility of an
automated-adaptive training system for GCA controller training is whether or not
automatic speech understanding technology has advanced sufficiently to meet the
needs of such a system. In general, the study, after analyzing the
requirements and technology and developing a feasible functional design, has reached
an affirmative but qualified conclusion. The qualifications are not considered
severe since the data developed indicate that system design solutions to the
limitations can probably be achieved. The first qualification involves the
speech understanding module itself; the second involves the development of
supporting software routines to resolve ambiguity and error.

Existing and available speech understanding equipment is constrained to
a limited number of 2-second samples of speech. The limit on the number of
samples can be overcome by expanding the system computer memory. The
patterns for the entire precision approach vocabulary, for example, can probably be
handled by adding 8K of core to the processor. Expansion of the time sample
to the required minimum 4 seconds (for PAR phraseology) was also investigated.
A software change to achieve a 4-second sample has been successfully
operated in the laboratory. Thus, it appears that the limitations (in terms of
vocabulary and sample size) of existing equipment to meet the needs of a GCA
controller training system can be technically solved.

The problem of ambiguities and errors in understanding or identifying
GCA phrases poses the second qualification. The speech pattern criteria
employed by all available speech recognition systems are not optimized for
the GCA vocabulary. Filters and preprocessing units are not readily
modified. However, additional software routines can be developed to compare or
correlate on selected features to discriminate the GCA vocabulary and to
exploit the wealth of redundant information available. Preliminary analyses and
laboratory investigations have been successful in such modifications.
However, detailed design and demonstration will be required before an evaluation
against the GCA controller training system requirements can be made.

The system approach utilized in the study proved particularly effective
in focusing attention on the major problems by isolating constraints and
performance requirements early in the definition phase. These requirements
or objectives provided the criteria for subsequent trade-off analyses of speech
understanding equipment. Once the trade-off analyses had identified feasible
equipment capability, conceptual design to the identified performance
requirements was readily completed.

The design analyses isolated the development problems or risks. The
problem areas which can only be solved by a prototype or breadboard design
include:

a. Performance measurement.

b.  Student  speech  variability. 

c.  Adaptive  logic  criteria. 

d.  Adaptive  variable  levels. 

e.  Student  feedback  and  criteria. 

f.  Acoustic  environment  effects. 

As can be seen, these risk or problem areas all revolve around the
training system implementation and require resolution before an operational
system can be developed and tested. Independent or isolated studies of the
problems or factors would not solve the training system design problem since the
problems are highly interactive. For example, ambient noise, student speech
variability, performance measurement, and recognition thresholds are
highly interactive.

Therefore, although feasibility of an automated-adaptive GCA precision
approach controller training system has been established, it is recommended
that a laboratory version of the conceptual system be developed and
implemented to explore and validate feasible solutions to the following problems
or risk areas:


a.  Recognition  and  discrimination  of  student's  speech  with  the  required 
accuracy  for: 

1.  Controlling  simulation  of  aircraft  and  pilot  during  approach. 

2. Evaluation of student's control performance.

3.  Evaluation  of  student's  knowledge  of  GCA  procedures  and 
vocabulary. 


b.  Processing  of  aircraft/pilot/controller  data  to: 

1.  Validate  and  support  automatic  speech  understanding. 

2. Generate syllabus difficulty factors.

3. Establish the magnitude and effect of student speech variability,
ambient noise, and related interactions.

c. Automated-adaptive training system integration; i.e., explore
and solve system implementation problems before a field evaluation
system is developed.


Thus, in summary, while conceptual feasibility is clearly indicated,
development risks and missing design data require the development and
exercising of a laboratory prototype system to explore the problems. The
laboratory system should be limited to precision approach training (as the
limiting condition), and simulation modules need only be general to explore
the variables and factors outlined.

APPENDIX A

PAR PHRASE LIST WITH CRITERIA

* Target refers to radar return of controlled aircraft as shown on CRT.

TIME-LINE ANALYSIS OF PAR CONTROLLER TASKS

TABLE 5. TYPICAL TIME SEQUENCE OF PAR CONTROLLER OUTPUT

Cumulated PAR Time
(in seconds)          Controller Output

00                    Approaching glidepath; begin descent flight
05                    Below glidepath, coming up
07                    Coming up and on glidepath
08                    Six miles from touchdown
12                    On glidepath, heading 233
14                    On glidepath
21                    On glidepath, heading 233; the centerline is left; correcting
25                    On glidepath, 5 miles from touchdown
29                    Going above, slightly above glidepath
32                    Slightly above glidepath
37                    Slightly above glidepath
39                    Turn right, heading 235
42                    Slightly above glidepath
50                    Turn right, heading 238; 4 miles from touchdown
53                    Going above glidepath
56                    Turn right, heading 242
60                    Above glidepath
63                    Above glidepath; heading 242
70
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TABLE  5.  TYPICAL  TIME  SEQUENCE  OF  PAR  CONTROLLER  OUTPUT  (Cont) 


Cumulated  PAR  Time 
(in  seconds) 


Controller  Output 


Slightly  above  glidepath,  coming  down 

Slightly  above  glidepath 

Three  miles  from  touchdown;  transmission 
break 

Coming  down  and  on  glidepath;  heading  242, 
on  course 

On  glidepath;  2-1/2  miles  from  touchdown; 
the  wind  is  350°  at  14 

Cleared  to  land,  runway  24,  right. 

Now  on  glidepath 

Turn  left  heading  240 

Now  on  glidepath 

Going  sligh:ly  below  glidepath,  slightly 
below 

Below  glidepath 

Going  further  below  glidepath 

Below  glidepath  and  coming  up 

One  one -half  miles  from  touchdown;  heading 
245  6 

Slightly  below  glidepath 

Slightly  below  glidepath;  turn  right  heading 
248;  the  centerline  is  right 

Coming  up  and  on  glidepath 

One  mile  from  touchdown;  the  centerline  is 
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TABLE  5.  TYPICAL  TIME  SEQUENCE  OF  PAR  CONTROLLER  OUTPUT  (Cont) 


Cumulated  PAR  Time 
(in  seconds) 


Controller  Output 


132 

138 

141 

144 

149 

150 
155 

Total  time  = 


Turn  right  heading  250;  on  glidepath  (con¬ 
tiguous  with  above  phrase) 

On  glidepath;  1/2  mile;  centerline  is  right, 
correcting 

On  glidepath 

On  glidepath;  heading  248;  on  glidepath 
Over  landing  threshold 
Over  touchdown 

Squawk,  standby,  contact  tower  at  315.  6 
2  minutes,  35  seconds. 
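
As a rough check on the pacing shown in table 5, the elapsed times printed in the first portion of the table can be differenced to give the interval between successive controller transmissions. The sketch below is illustrative only: the variable names are assumptions, and it uses only the times that appear in the table.

```python
# Elapsed times (seconds) printed in the first portion of table 5.
par_times = [0, 5, 7, 8, 12, 14, 21, 25, 29, 32, 37, 39, 42, 50, 53, 56, 60, 63, 70]

# Interval between each transmission and the one before it.
gaps = [later - earlier for earlier, later in zip(par_times, par_times[1:])]

mean_gap = sum(gaps) / len(gaps)
longest_gap = max(gaps)

# The last entry in the table is at 155 seconds; convert to minutes and seconds.
minutes, seconds = divmod(155, 60)

print(f"{len(gaps)} intervals, mean {mean_gap:.1f} s, longest {longest_gap} s")
print(f"total time: {minutes} min {seconds} s")
```

The same differencing could be applied to the final portion of the table (132 through 155 seconds) to compare pacing near touchdown.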


NAVTRAEQUIPCEN  73-C-0079-1 


APPENDIX  C 

REVIEW,  SELECTION,  AND  TEST  OF  SPEECH 
RECOGNITION  DEVICES 


REVIEW  OF  SPEECH  RECOGNITION  DEVICES 

A review of the literature in automatic speech recognition (ASR) between
1950 and the present revealed at least 20 attempts to build devices that
recognize and/or understand human speech. Although the majority of these
devices have been built in the United States, others have been attempted in
the U.S.S.R., Great Britain, and the Scandinavian countries. Of those in
the United States, the majority have been built by large corporations and
smaller research firms attached to universities. Supporting research
funding has been supplied by the Advanced Research Projects Agency (ARPA)
of the U.S. Government. The funding for devices has come from a variety of
sources, including corporate development, government (Air Force, Post
Office), and university funds.

The success of these devices varies with the scope and approach taken
by the various developers. The scope of the development determined whether
the intended application required 'isolated word recognition' as opposed to
'continuous speech' or 'connected word' speech, and whether the
application required understanding in addition to recognition. The
approach determined whether digital or analog techniques were used, whether
pattern-matching or segmentation was employed, and whether the recognition
element was phonemic or feature oriented. Most of the devices were
directed to the isolated or connected word problem with small vocabularies,
and used the analog segmentation approach. Only four of the devices are
known to be commercially available today. These are made by:

a. Scope Electronics.

b. Threshold Technology, Inc.

c. Perception Technology, Inc.

d. Culler-Harrison, Inc.

The first three companies were visited and their systems discussed to
determine their capabilities. The fourth company presently produces only a
sound analysis device, but it can be used for speech with some modification.
Telephone discussions with its personnel indicated that work is in progress
on a speech recognition device, but it has not been completed or tested.

The following is a description of the three devices, based on information
from the visits, brochures, and reports.

a. Threshold Technology, Inc. - Threshold Technology, Inc. (TTI)
builds a basic system named the VIP-100, which recognizes
connected word phrases or isolated words from a limited
vocabulary (32 words or phrases) at 95-percent accuracy. The
length of phrases is limited to 2.0 seconds and the vocabulary
is limited to 32 such phrases. These limits of the VIP-100 are
determined by the software and core memory available and are,
therefore, expandable.

In terms of configuration, the VIP-100 is composed of a special
preprocessor (including an analog-to-digital converter) and
feature extractor. The equipment includes a minicomputer
(Nova 1200), an ASR 33 Teletypewriter, and a small display/
control device for system operation. The interface between the
Nova 1200 and the preprocessor/feature extractor unit is unique
but employs standard input/output card connectors for the Nova
1200.

During operation, once the end of an utterance is detected, the
VIP-100 segments each utterance of up to 2.0 seconds into a matrix
containing 512 bits of information (32 features mapped onto 16 time
segments). The feature set contains
five broad class features (vowel/vowel-like, long pause, short
pause, unvoiced noise-like consonants, and bursts) and 27
phoneme-like features. Of these 27 phoneme-like features, 15
are vowel indicants, thus stressing the importance of vowel
detection. These features are derived by passing each utterance through
a set of 19 bandpass filters, ranging in center frequency from 260
Hz to 7626 Hz. The output of these filters is full-wave rectified and
logarithmically compressed to provide a 50 dB dynamic range from
which ratio measurement is possible. From various combinations
and sequences of these outputs, a significant set of features is derived,
namely the spectral derivative (de/df), which is indicative of the overall
spectrum shape. Measurement of the slopes of spectrum energy changes
ensures detection of peaks which, in turn, serve to identify formants.
Formants (energy concentrations at particular frequencies) are known
to be crucial to vowel detection and discrimination.

Recognition is performed in the Nova 1200 by comparison of matrices
generated in real time to stored matrices from a training
session. To recognize phrases or words, the VIP-100 requires
that each utterance be spoken 10 consecutive times during training.
This allows the VIP-100 to develop a consistent matrix against which
utterances may be digitally compared for recognition. Similarities and
dissimilarities in each comparison are appropriately weighted, and
the net result provides a weighted correlation product. Correlation
products are also calculated after the input matrix has been
shifted ±1 time segment. The stored referent producing the
highest correlation when all stored matrices are compared to
the input is selected as the 'recognition' match. To prevent the
VIP-100 from being false-alarm prone, a threshold has been
programmed which, in effect, causes the device to reject the
utterance if some threshold correlation value is not met or
exceeded.

Developments under way include improvement of the feature set,
increase in recognition resolution by combining the number of samples
taken, modifying the recognition technique to decrease dependence on
speaker stress and intonation, and improving 'training' techniques.
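
The matching procedure described for the VIP-100 (a 32 x 16 binary matrix, a weighted comparison, a shift of ±1 time segment, and a rejection threshold) can be sketched roughly as follows. This is not the manufacturer's algorithm: the majority-vote reference, the equal weights, the zero-filled shift, the 0.6 threshold, and all function names are assumptions chosen only to make the idea concrete.

```python
import random

FEATURES, SEGMENTS = 32, 16  # 32 features x 16 time segments = 512 bits


def build_reference(samples):
    """Majority vote over training matrices (assumed; the report only says 'consistent matrix')."""
    n = len(samples)
    return [[1 if 2 * sum(s[f][t] for s in samples) > n else 0
             for t in range(SEGMENTS)]
            for f in range(FEATURES)]


def shift(matrix, k):
    """Shift every feature row by k time segments, zero-filling the vacated columns."""
    if k > 0:
        return [[0] * k + row[:-k] for row in matrix]
    if k < 0:
        return [row[-k:] + [0] * (-k) for row in matrix]
    return [list(row) for row in matrix]


def correlation(a, b, agree=1.0, disagree=1.0):
    """Weighted agreement score ([-1, 1] with the default, assumed weights)."""
    total = 0.0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += agree if x == y else -disagree
    return total / (FEATURES * SEGMENTS)


def recognize(utterance, references, threshold=0.6):
    """Return the best-matching label, or None (a rejection) below the threshold."""
    best_label, best_score = None, float("-inf")
    for label, ref in references.items():
        score = max(correlation(shift(utterance, k), ref) for k in (-1, 0, 1))
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


def random_matrix(rng):
    """Stand-in for a feature matrix; real matrices would come from the feature extractor."""
    return [[rng.randint(0, 1) for _ in range(SEGMENTS)] for _ in range(FEATURES)]


def noisy(matrix, rng, flips=20):
    """Copy a matrix and flip a few randomly chosen bits to mimic speaker variability."""
    out = [list(row) for row in matrix]
    for _ in range(flips):
        out[rng.randrange(FEATURES)][rng.randrange(SEGMENTS)] ^= 1
    return out


rng = random.Random(0)
truth = {"on glidepath": random_matrix(rng), "on course": random_matrix(rng)}
# Ten training repetitions per phrase, as the report describes.
references = {label: build_reference([noisy(m, rng) for _ in range(10)])
              for label, m in truth.items()}

accepted = recognize(noisy(truth["on course"], rng), references)
rejected = recognize(random_matrix(rng), references)  # an untrained utterance
print(accepted, rejected)
```

Raising the threshold trades false recognitions for rejections, which is the balance the programmed threshold is described as controlling.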

b. Scope, Inc. - Scope, Inc. builds a basic system, named the VCS,
which also recognizes connected word or isolated word speech
from a limited vocabulary (24 phrases, each 1.0 second in length)
with 90-percent accuracy. As with the VIP-100 device, phrase
length and vocabulary are a function of software and memory,
respectively.

In terms of configuration, the VCS consists of a spectrum analyzer,
an analog-to-digital converter, a hard-wired processor (digital),
memory, and an output register device. A standard minicomputer
could be substituted for the digital processor and memory.

During operation, the spectrum analyzer divides each utterance
into 16 frequency bands between 200 and 5000 Hz. The bands
compose a power spectrum which is a frequency x amplitude x time
representation of the speech signal. Samples from these bands
are taken every 1/60 second and multiplexed onto a single channel
where they are converted to digital form. Thus, the original
utterance arrives at the processor as a string of four-bit binary
numbers, each representing the amplitude of one of the 16
frequency bands at some instant in time. In the processor, these
strings are compressed (normalized) into a fixed length (120 bit)
code (pattern) which represents the salient features of the uttered
spectrum. Thus, there is no segmentation in the VCS. Formants
are detected by peaks in the spectral pattern.

During recognition, patterns generated by the real-time
normalization routine are compared to reference patterns generated
during training. During 'training,' five voicings of each word or
phrase of the vocabulary are compressed into 120-bit patterns
and stored in core memory. These 120 bits contain both salient
and not-so-salient variations of the five utterances. These stored
patterns are compared to the input utterance where they are
matched bit by bit, summed, and compared to a threshold value.
The highest output of summed matched bits is accepted as the
recognition if it is above threshold; otherwise, the VCS rejects
the utterance as unfamiliar.
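
The figures given for the VCS imply a raw data rate that can be worked out directly, and the bit-by-bit match against a threshold is simple to sketch. In the example below the band count, sampling rate, sample width, and pattern length come from the description above; the threshold of 100 matched bits, the treatment of the five voicings as separate stored patterns, the example bit patterns, and the function names are assumptions. The normalization step itself is not described in enough detail to reproduce and is not modeled.

```python
BANDS = 16            # frequency bands from the spectrum analyzer
SAMPLE_RATE = 60      # samples per second in each band
BITS_PER_SAMPLE = 4   # four-bit amplitude code
PATTERN_BITS = 120    # fixed-length pattern after normalization

# Raw bits arriving at the processor for a 1.0-second phrase.
raw_bits_per_second = BANDS * SAMPLE_RATE * BITS_PER_SAMPLE
compression_ratio = raw_bits_per_second / PATTERN_BITS


def matched_bits(a, b):
    """Count the bit positions (out of PATTERN_BITS) where two patterns agree."""
    mask = (1 << PATTERN_BITS) - 1
    return PATTERN_BITS - bin((a ^ b) & mask).count("1")


def best_score(pattern, voicings):
    """Best match against any stored voicing of one phrase (an assumed policy)."""
    return max(matched_bits(pattern, v) for v in voicings)


def recognize(pattern, references, threshold=100):
    """Return the phrase with the most matched bits, or None if below threshold."""
    label = max(references, key=lambda name: best_score(pattern, references[name]))
    return label if best_score(pattern, references[label]) >= threshold else None


# Made-up 120-bit reference patterns, one voicing per phrase for brevity.
ref_a = int("10" * 60, 2)
ref_b = int("1100" * 30, 2)
references = {"on course": [ref_a], "on glidepath": [ref_b]}

spoken = ref_a ^ 0b111          # 'on course' with three bits disturbed
print(raw_bits_per_second, compression_ratio)
print(recognize(spoken, references), recognize(0, references))
```

Under these assumptions a 1.0-second phrase is reduced by a factor of 32 before matching, which is why the comparison itself is inexpensive.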

c. Perception Technology, Inc. - Perception Technology, Inc. (PTI)
builds a basic adaptive speech recognition system which presently
recognizes isolated, single-syllable digits or connected strings
of digits up to 2.4 seconds in length with accuracies between 90
and 98 percent. A positive feature of the PTI device is its
minimal speaker dependence, which nearly eliminates the need for
prior training. If needed, training consists of voicing six key
words three times each. This allows the adaption routine to
derive transformations which shift the speaker's voice
characteristics toward the pre-established norm. This procedure accounts
for accent, sex, and intonation differences observed between
speakers.

In theory, the PTI concept differs markedly from that of other
manufacturers, who tend to emphasize 'formant theory.' The PTI approach
places speech theory in a class with relativity and color
perception, stressing its polar and relativity features. The theory is
based on the assumption that speech, like color, may be placed
on a 'wheel' in which two dimensions (frequency and amplitude)
are represented. Of specific interest in the theory are pitch
(frequency) and loudness (amplitude). The two-dimensional
speech wheel is divided into patches. These patches represent
frequency combinations of utterances in a closed space. Once
the space is divided for a certain vocabulary, the locus (trajectory)
of points corresponding to words is analyzed for patch 'hits.' The
identities of the patches entered by the trajectory of a word are
processed by the decision algorithm in conjunction with 24 other
features such as initial or final 'S,' etc. Word boundaries
are determined by pauses in a 400-700 Hz bandpass filter or
suitable dips in the energy spectrum. These 24 features are
derived from the output of six bandpass filters between 250 and
5300 Hz.

The system is composed of a PDP-8E minicomputer with 8K of
memory, an ASR-33 Teletypewriter, and a display unit.

While the PTI device is not sufficiently developed for the present
application, planned development will bring it closer. Plans have
been made by PTI to investigate the application of this theory to
larger vocabularies.

SELECTION OF A SPEECH RECOGNITION DEVICE FOR THE SUS

Based primarily upon considerations of the capability of the device to
perform the functions described in this report, the VIP-100 was selected
as the tentative front-end device for the SUS. The selection of the VIP-100
was bolstered by new developments and improvements being undertaken by
Threshold Technology, Inc. The VCS (Scope, Inc.) would require too
extensive a modification to perform the subject tasks. The PTI device is not yet
available.

VIP-100  TESTS 

A test using selected PAR advisories was devised to check the
recognition accuracy of the VIP-100 system. The test consisted of nine lists of
10 items each, since the VIP-100 system used had storage for a vocabulary
of only 10 items of 2 seconds or less in length. Five of the nine lists
consisted of phrases, while the remaining four lists were single words. (Refer
to table 6.) The first list included glidepath advisories; the second list,
trend advisories; the third list, course corrections; the fourth list, headings
(numerics); and the fifth list, information advisories.

The VIP-100 was trained on each list by repeating each item 10 times in
succession (100 total repetitions). Once the device was trained on a given
list, the items in that list were randomly selected and spoken until all
items had been repeated four times. This procedure tested for intralist
false recognitions. Once four repetitions of the trained list had been
completed, each of the other lists was read once in order in a test for interlist
false recognitions. This procedure was repeated for all nine lists. Errors
(both false recognitions and rejections) were recorded.
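
The size of the test implied by this procedure can be tallied as a quick sanity check. The sketch below assumes that "each of the other lists" means all eight remaining lists were read once against each trained list; the constant and variable names are illustrative.

```python
LISTS = 9
ITEMS_PER_LIST = 10
TRAINING_REPETITIONS = 10   # each item repeated 10 times during training
INTRALIST_REPETITIONS = 4   # each trained item spoken four times in test

# Utterances needed for one trained list.
training_per_list = ITEMS_PER_LIST * TRAINING_REPETITIONS
intralist_per_list = ITEMS_PER_LIST * INTRALIST_REPETITIONS
interlist_per_list = (LISTS - 1) * ITEMS_PER_LIST  # assumption: all 8 other lists

# Totals across all nine lists.
totals = {
    "training": LISTS * training_per_list,
    "intralist": LISTS * intralist_per_list,
    "interlist": LISTS * interlist_per_list,
}
print(totals)
```

Under that assumption the interlist test involves twice as many utterances as the intralist test, which is worth keeping in mind when comparing the two accuracy figures.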

The results revealed 88.4 percent overall recognition accuracy across
all lists: 87 percent on intralist comparisons and 89.9 percent on interlist
comparisons. Phrases had 87.0 percent and 95.8 percent accuracy for
intralist and interlist comparisons, respectively. Key words had relatively
poor recognition - 79 percent and 87.5 percent for interlist and intralist
comparisons. The numerals, however, were recognized with 100 percent
accuracy. Heading (course) and glidepath advisories (lists 1, 2, 3, and 4)
were discriminated from information advisories (list 5) 92.5 percent of the
time. Of interlist errors, 23 percent were due to rejections. (A rejection
is an event in which the VIP-100, in effect, says 'I don't recognize the
spoken phrase.')

While these scores are lower than values required by the SUS, several
factors should be considered:

a. The VIP-100 used in the tests was an 'early' version made
available for showroom demonstration. A newer version with
increased capability is available.

b. The VIP-100 is sensitive to changes in pitch and inflection. This
sensitivity is reduced on newer versions by virtue of basic feature
changes.

c. The VIP-100 system tested utilized only 16 time samples. This can
be increased to 32.

d. The recognition system was 'self-standing.' Additional software
which compares expectancies with 'observed' speech inputs, for
example, can be added for the training system application.
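
The idea in item d, comparing expectancies with observed speech inputs, might look something like the following. This is a hypothetical sketch, not a description of any software in the report: the score scale, the fixed bonus for expected phrases, the threshold, and the function name are all assumptions.

```python
def rescore(candidates, expected, bonus=0.1, threshold=0.6):
    """Pick a phrase after favoring those expected in the current situation.

    candidates: phrase -> recognizer score in [0, 1] (hypothetical scale)
    expected:   phrases that fit the simulated aircraft's present state
    """
    adjusted = {phrase: score + (bonus if phrase in expected else 0.0)
                for phrase, score in candidates.items()}
    best = max(adjusted, key=adjusted.get)
    return best if adjusted[best] >= threshold else None


# Two acoustically similar advisories with nearly equal scores.
candidates = {"slightly above glidepath": 0.58, "slightly below glidepath": 0.59}
# The simulated aircraft is above the glidepath, so 'above' advisories are expected.
expected = {"slightly above glidepath", "well above glidepath", "going above glidepath"}

standalone = rescore(candidates, expected=set())  # no expectancy information
with_context = rescore(candidates, expected)
print(standalone, with_context)
```

In this made-up case the stand-alone recognizer rejects the utterance, while the expectancy resolves it. A training system would presumably also keep the unadjusted scores, since a student may in fact have spoken the wrong advisory.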


TABLE 6.  VIP-100 TEST PHRASES

List - I                         List - II

Approaching glidepath            And holding
Begin descent                    And coming down
On glidepath                     And coming rapidly down
Going above glidepath            And coming slowly down
Slightly above glidepath         And going further above
Well above glidepath             And correcting
Going below glidepath            And coming up
Slightly below glidepath         And coming rapidly up
Well below glidepath             And coming slowly up
Going rapidly below glidepath    And going further below

List - III                       List - IV

On course                        Heading is 247
Going right of course            Heading is 245
Going left of course             Heading is 249
Right of course                  Heading is 159
Left of course                   Heading is 199
Slightly right of course         Heading is 195
Slightly left of course          Heading is 012
Well right of course             Heading is 201
Well left of course              Heading is 102
Turn right, heading              Heading is 360

List - V

Wind is 210
Centerline is right
Cleared to land
Runway 24 right
At decision height
One mile to touchdown
Over approach lights
Over landing threshold
Execute missed approach
Contact tower

List - VI

Glidepath
Heading
Course

List - VII                       List - VIII

Holding                          And
Coming                           Of
Going                            From
Rapidly                          Is
Slowly                           If
Further                          At
Correcting                       One-half
Centerline                       Up
Approaching                      Down
Approach                         Half

List - IX

One
Two
Three
Four
Five
Six
Seven
Eight
Nine
Zero

GLOSSARY

AFT               Automated flight training
ARPA              Advanced Research Projects Agency
ASCS              Adaptive Syllabus Control Subsystem
ASR               Automatic speech recognition
ATE               Automated training evaluation
AZ                Azimuth
C                 Criteria value
CIC               Combat information center
CN                Criteria for performance
CRT               Cathode ray tube
D                 Difficulty level
EL                Elevation
FAA               Federal Aviation Agency
GCA               Ground controlled approach
GCI               Ground controlled intercept
NAS               Naval Air Station
NATTC             Naval Air Technical Training Center
NAVTRAEQUIPCEN    Naval Training Equipment Center
NM                Nautical mile(s)
NTIS              National Technical Information Service
PAR               Precision approach radar
PTI               Perception Technology, Inc.
R                 Run number
S                 Score
σ                 Standard deviation
SUS               Speech understanding system
TTI               Threshold Technology, Inc.